Intro

This markdown document provides a condensed overview of methods for doing spatial data analysis and map-making in R for the Battles Lab. I’m aiming for a broad-but-shallow introduction, with the idea that you can pursue individual topics more deeply if they’re relevant to your work. This document borrows heavily from the excellent free ebook Geocomputation with R by Robin Lovelace. I’ll editorialize with my own experience and add additional resources that have been helpful to me.

Pros and cons of R for spatial data analysis

Pros:

  • Scripting built into analysis vastly improves reproducibility of your work.
  • Keeping your whole analysis pipeline (aspatial data, spatial data, analysis, figure/map production) all in the R environment makes project organization easier.
  • I personally prefer the R / ggplot aesthetic for mapmaking.

Cons:

  • R packages are generally not as graceful with big spatial datasets (e.g. statewide 30 m rasters, the whole Region 5 FACTS database). The situation is improving (see the terra and stars packages), but you still need to be thoughtful about where and how the computer is storing and processing data in situations where you could just blindly plug and play in ArcGIS.

Packages and resources

Go-to packages:

library(here) # for easy filepathing
## here() starts at C:/Users/dfoster/Documents/spatial_intro
library(raster) # for raster data
## Loading required package: sp
library(sf) # for vector data
## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'pillar'
## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'tibble'
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(ggplot2) # for mapmaking

library(tidyverse) # for data munging, including of sf objects
## Warning: replacing previous import 'lifecycle::last_warnings' by
## 'rlang::last_warnings' when loading 'hms'
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## v purrr   0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x tidyr::extract() masks raster::extract()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## x dplyr::select()  masks raster::select()

Other useful packages:

library(terra) # next-gen version of raster/sf, with a less developed guidance/support 
## Warning: package 'terra' was built under R version 4.1.2
## terra version 1.4.22
## 
## Attaching package: 'terra'
## The following object is masked from 'package:dplyr':
## 
##     src
## The following object is masked from 'package:purrr':
## 
##     simplify
## The following object is masked from 'package:tidyr':
## 
##     extract
## The following objects are masked from 'package:raster':
## 
##     adjacent, animate, area, boundaries, buffer, cellFromRowCol,
##     cellFromRowColCombine, cellFromXY, clamp, click, colFromCell,
##     colFromX, cover, crop, crosstab, crs, crs<-, distance, erase,
##     extend, extract, flip, focal, freq, geom, hasValues, init,
##     inMemory, interpolate, mask, modal, mosaic, ncell, ncol<-, nrow<-,
##     origin, origin<-, plotRGB, rasterize, readStart, readStop, rectify,
##     res, res<-, resample, RGB, rotate, rowColFromCell, rowFromCell,
##     rowFromY, setMinMax, setValues, shift, stretch, symdif, terrain,
##     trim, values, values<-, writeRaster, writeStart, writeStop,
##     writeValues, xFromCell, xFromCol, xmax, xmax<-, xmin, xmin<-, xres,
##     xyFromCell, yFromCell, yFromRow, ymax, ymax<-, ymin, ymin<-, yres,
##     zonal, zoom
# ecosystem but a better ability to handle big data
# https://rspatial.org/terra/pkg/index.html

library(stars) # next-gen version of raster, IMO not as developed as terra
## Loading required package: abind
# https://r-spatial.github.io/stars/

library(tmap) # another mapping package, particularly useful for data exploration
# https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html

I highly recommend these resources for doing spatial analysis in R, and I’ll flag other relevant resources and packages in specific sections.

Spatial data management and data exploration

There are two main types of spatial data: vector data (points, lines, and polygons with attached attribute tables) and raster data (regular grids of cells).

Vector Data

# load DX area of interest from a shapefile
dx_aoi = 
  sf::st_read(here::here('data', 'small', 'dxstudyarea.shp'))
## Reading layer `dxstudyarea' from data source 
##   `C:\Users\dfoster\Documents\spatial_intro\data\small\dxstudyarea.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335850.6 ymin: 4045386 xmax: 342623.8 ymax: 4051330
## Projected CRS: NAD83 / UTM zone 11N

The metadata tells us that there’s only a single feature (point, line, or polygon) with six “fields” (columns of data). You can print the metadata any time by calling the sf object:

# print metadata
dx_aoi
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335850.6 ymin: 4045386 xmax: 342623.8 ymax: 4051330
## Projected CRS: NAD83 / UTM zone 11N
##   OBJECTID Id gridcode Shape_Leng Shape_Area area_ha
## 1        1  1     1827   60099.38   17046296 1704.63
##                         geometry
## 1 MULTIPOLYGON (((336799.6 40...

Usefully, this also tells us the coordinate reference system (CRS) of the data.

We can do a basic plot, which gives one panel per field:

plot(dx_aoi)

This isn’t that helpful, though. The tmap library is great for providing ArcMap-style interactive maps, using a ggplot-style syntax:

tmap::tmap_mode('view') # sets the viewer to interactive; the alternative "plot" 
## tmap mode set to interactive viewing
# used for standard figure-style maps

tmap::tm_shape(dx_aoi)+ # tell tmap what data to use
  tmap::tm_borders(col = 'blue')+ # draw just polygon borders with a blue color
  # we could also have set col = colname, where colname is the name of a field in
  # the shapefile, to color by some variable. alternatives are tm_polygons() to
  # draw filled polygons, tm_dots() for point data, and tm_lines() for linear features
  tmap::tm_scale_bar() # add a scale bar

Now we have an interactive map with some nice baselayers for context. We will add additional layers later.

Often vector data will be stored in a geodatabase, which we can also read using sf:

# get the names of all the layers in the geodatabase
sf::st_layers(here::here('data', 'big', 'fire20_1.gdb'))
## Driver: OpenFileGDB 
## Available layers:
##              layer_name geometry_type features fields
## 1             firep20_1 Multi Polygon    21318     17
## 2            rxburn20_1 Multi Polygon     6412     16
## 3 Non_RXFire_Legacy13_2 Multi Polygon      864     16
frap_fires = 
  # note that you have to specify the layer name if reading from a geodatabase;
  # if you don't it will read in the first layer by default. 
  sf::st_read(here::here('data',
                     'big',
                     'fire20_1.gdb'),
              layer = 'firep20_1')
## Reading layer `firep20_1' from data source 
##   `C:\Users\dfoster\Documents\spatial_intro\data\big\fire20_1.gdb' 
##   using driver `OpenFileGDB'
## Simple feature collection with 21318 features and 17 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -373237.5 ymin: -604727.6 xmax: 539438.2 ymax: 518283.7
## Projected CRS: NAD83 / California Albers

Raster Data

Most raster data types are handled well by the raster package. You might also use terra or stars, especially for big datasets.

A single raster layer can be loaded using the raster function:

mmi = raster::raster(here::here('data', 
                                'small',
                                'mmi_sum5_sc408ss_2016.bsq'))

Again we can look at metadata easily:

mmi
## class      : RasterLayer 
## dimensions : 1777, 4501, 7998277  (nrow, ncol, ncell)
## resolution : 30, 30  (x, y)
## extent     : 213855, 348885, 4034985, 4088295  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : mmi_sum5_sc408ss_2016.bsq 
## names      : mmi_sum5_sc408ss_2016 
## values     : 0, 255  (min, max)
## attributes :
##         ID                                        category
##  from:   0              0: No Canopy Loss / No Disturbance
##   to : 112 112: Non-Processed Landcover Type / Unavailable

This raster has attributes, meaning that the actual cell data values (integers from 0 to 255) are mapped to categories. In this case, the cell data values map directly to % mortality:

head(levels(mmi)[[1]])
##   ID                           category
## 1  0 0: No Canopy Loss / No Disturbance
## 2  1                           1 % loss
## 3  2                           2 % loss
## 4  3                           3 % loss
## 5  4                           4 % loss
## 6  5                           5 % loss

And basic plotting is also easy:

plot(mmi)

tmap again offers a nice interactive map:

tmap_mode('view')
## tmap mode set to interactive viewing
tmap::tm_shape(mmi)+
  tmap::tm_raster()+
  tmap::tm_scale_bar()
## stars object downsampled to 1592 by 628 cells. See tm_shape manual (argument raster.downsample)
## Warning: Number of levels of the variable "NA" is 113, which is
## larger than max.categories (which is 30), so levels are combined. Set
## tmap_options(max.categories = 113) in the layer function to show all levels.

Note that tmap has automatically downsampled the large raster to make it plot in a reasonable amount of time.
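If you want tmap to downsample less aggressively, you can raise its cell limit with tmap_options(). A sketch, with illustrative limits (higher limits mean slower rendering):

```r
library(tmap)

# allow up to ~10 million cells in static 'plot' mode and ~1 million in
# interactive 'view' mode before tmap downsamples; tune these to your machine
tmap_options(max.raster = c(plot = 1e7, view = 1e6))
```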

Multiple layers - a raster stack

NetCDF files are a common file format for raster stacks:

terraclimate = raster::stack(here::here('data',
                                         'big',
                                         'TerraClimate_vpd_2020.nc'))
## Loading required namespace: ncdf4
## Warning in .getCRSfromGridMap4(atts): cannot process these parts of the CRS:
## long_name=crs
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition

## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition

It didn’t like something about the projection here; let’s take a closer look:

terraclimate
## class      : RasterStack 
## dimensions : 4320, 8640, 37324800, 12  (nrow, ncol, ncell, nlayers)
## resolution : 0.04166667, 0.04166667  (x, y)
## extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +ellps=WGS84 +no_defs 
## names      : X2020.01.01, X2020.02.01, X2020.03.01, X2020.04.01, X2020.05.01, X2020.06.01, X2020.07.01, X2020.08.01, X2020.09.01, X2020.10.01, X2020.11.01, X2020.12.01

Seems OK. Note that this raster stack has 12 bands, one for each month.

plot(terraclimate) # plot them all

# plot individual layers
plot(terraclimate[[1]])

Common Data Munging

Reprojections and transformations

The basic metadata of either a raster or sf object will tell you the coordinate reference system. You can also extract it specifically:

# just the basics
dx_aoi
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335850.6 ymin: 4045386 xmax: 342623.8 ymax: 4051330
## Projected CRS: NAD83 / UTM zone 11N
##   OBJECTID Id gridcode Shape_Leng Shape_Area area_ha
## 1        1  1     1827   60099.38   17046296 1704.63
##                         geometry
## 1 MULTIPOLYGON (((336799.6 40...
# the full version
st_crs(dx_aoi)
## Coordinate Reference System:
##   User input: NAD83 / UTM zone 11N 
##   wkt:
## PROJCRS["NAD83 / UTM zone 11N",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["UTM zone 11N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-117,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     ID["EPSG",26911]]
mmi
## class      : RasterLayer 
## dimensions : 1777, 4501, 7998277  (nrow, ncol, ncell)
## resolution : 30, 30  (x, y)
## extent     : 213855, 348885, 4034985, 4088295  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs 
## source     : mmi_sum5_sc408ss_2016.bsq 
## names      : mmi_sum5_sc408ss_2016 
## values     : 0, 255  (min, max)
## attributes :
##         ID                                        category
##  from:   0              0: No Canopy Loss / No Disturbance
##   to : 112 112: Non-Processed Landcover Type / Unavailable
raster::crs(mmi)
## CRS arguments:
##  +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs

Note that the raster object has a much simpler representation of the CRS. The raster package is using the old-style proj4string representation, while sf has been updated to the new, much-more-explicit WKT version. The switchover has been a big deal in the open-source spatial software community. In my experience, packages have so far been pretty clever about speaking either CRS language and converting when needed (sometimes with warnings). However, things may break at some point in the future, and Google is your friend.
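To translate between the two representations without reading the WKT by eye, sf’s crs object exposes both (the $epsg accessor assumes the CRS cleanly matches an EPSG code; it returns NA otherwise):

```r
# the numeric EPSG code, if sf can match one
sf::st_crs(dx_aoi)$epsg
## [1] 26911

# the equivalent old-style proj4string
sf::st_crs(dx_aoi)$proj4string
```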

Reprojecting either a raster or a vector object can be very slow if the object is large and/or complex, so often you’ll want to try to crop / downscale / subset the data before trying to reproject it.
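As a sketch of that workflow with the global TerraClimate stack (assuming, per its metadata above, that it’s in lat/long WGS84, EPSG:4326): crop first in the raster’s own CRS, then reproject only the small piece.

```r
# pad the AOI by 50 km while its units are still meters, express it in the
# raster's lat/long CRS, and crop the global raster to that rectangle...
vpd_crop = raster::crop(terraclimate[[1]],
                        dx_aoi %>%
                          sf::st_buffer(50000) %>%
                          sf::st_transform(crs = 4326) %>%
                          sf::st_bbox())

# ...then reproject just the cropped piece, which is vastly faster than
# reprojecting all ~37 million cells of the full global raster
vpd_utm = raster::projectRaster(vpd_crop, crs = 'EPSG:26911')
```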

# load the DX plot data
dx_plots = 
  st_read(here::here('data',
                     'small',
                     'DX_plots_29orig.shp'))
## Reading layer `DX_plots_29orig' from data source 
##   `C:\Users\dfoster\Documents\spatial_intro\data\small\DX_plots_29orig.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 29 features and 18 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 336428.8 ymin: 4046051 xmax: 341506.8 ymax: 4050174
## Projected CRS: WGS 84 / UTM zone 11N
# test equality of the CRS
st_crs(dx_plots) == st_crs(dx_aoi) # FALSE; one is WGS84 and the other is NAD83
## [1] FALSE

We can reproject/transform the coordinate reference system, usually so that multiple objects are on the same system and we can have them interact.

# with vector data
# reproject the plots to match the aoi
dx_plots = 
  dx_plots %>%
  sf::st_transform(crs = st_crs(dx_aoi))

# now they're on the same CRS
st_crs(dx_plots) == st_crs(dx_aoi)
## [1] TRUE
# with raster data
vpd_small = 
  raster::projectRaster(from = terraclimate[[1]], crs = 'EPSG:26911')
## Warning in rgdal::rawTransform(projfrom, projto, nrow(xy), xy[, 1], xy[, : 3
## projected point(s) not finite
## Warning in rgdal::rawTransform(projto_int, projfrom, nrow(xy), xy[, 1], : 460
## projected point(s) not finite

The use of the EPSG string is usually the quickest and easiest way to define a projection under both the old and new systems, and the common CRSes are very google-able.
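For example, these are equivalent ways to target NAD83 / UTM zone 11N (EPSG:26911); the object names are just illustrative:

```r
# numeric EPSG code
dx_plots_a = sf::st_transform(dx_plots, crs = 26911)

# EPSG string, as used for projectRaster() above
dx_plots_b = sf::st_transform(dx_plots, crs = 'EPSG:26911')

# or copy the CRS from an existing object, as we did earlier
dx_plots_c = sf::st_transform(dx_plots, crs = sf::st_crs(dx_aoi))
```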

We can also set the coordinate reference system without reprojecting. You want to be very careful doing this; usually you only use it when, for some reason, the data doesn’t know its CRS but you do:

dx_busted = dx_aoi
st_crs(dx_busted) = NA

dx_busted
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335850.6 ymin: 4045386 xmax: 342623.8 ymax: 4051330
## CRS:           NA
##   OBJECTID Id gridcode Shape_Leng Shape_Area area_ha
## 1        1  1     1827   60099.38   17046296 1704.63
##                         geometry
## 1 MULTIPOLYGON (((336799.6 40...
st_crs(dx_busted) = st_crs(dx_aoi)

dx_busted
## Simple feature collection with 1 feature and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335850.6 ymin: 4045386 xmax: 342623.8 ymax: 4051330
## Projected CRS: NAD83 / UTM zone 11N
##   OBJECTID Id gridcode Shape_Leng Shape_Area area_ha
## 1        1  1     1827   60099.38   17046296 1704.63
##                         geometry
## 1 MULTIPOLYGON (((336799.6 40...

Cropping and buffering

Cropping both rasters and polygons is much faster if you crop by a rectangular extent than if you’re trying to crop by a polygon. If you’re trying to crop a big dataset down to a more manageable size, you almost always want to do a rough first pass using a rectangular extent and then go back and do your fine clipping later.

Raster Cropping

# start with a big MMI raster
plot(mmi)

# set values greater than 100 to NA
mmi[mmi>100] = NA

# first do a rough rectangular crop to a buffered AOI; note st_bbox() to get the 
# rectangular extent of the polygon
mmi_aoi = raster::crop(mmi,
                       dx_aoi %>%
                         st_buffer(100) %>%
                         st_bbox())

# rough rectangular crop is fast
plot(mmi_aoi)

# now mask the smaller raster by the polygon
mmi_masked = 
  raster::mask(mmi_aoi, st_buffer(dx_aoi,100))

# plot the masked raster
plot(mmi_masked)

The built-in attribute data for this raster annoyingly map 0 and NA to the same color, but they are different in the underlying data.

plot(mmi_masked==0)

Vector Cropping and Buffering

# first, reproject the small AOI polygon to match the CRS of the big fire polygons 
# dataset; we do this even though we want to ultimately use the CRS of the aoi 
# polygon because it's much faster to reproject the small dataset


frap_fires_clipped = 
  
  # start with frap fires
  frap_fires %>%
  
  # I was getting a weird error message; googling it leads to this Stack Overflow
  # thread https://stackoverflow.com/questions/61404151/error-when-using-st-intersects-in-cpl-geos-binopst-geometryx-st-geometryy
  # which suggests using st_cast()
  st_cast('MULTIPOLYGON') %>%

  # rough rectangular crop; note that we first reproject the aoi polygon to 
  # match the CRS of the fire polygons (we change the AOI polygon because it's 
  # smaller and simpler, and thus faster to reproject)
  sf::st_crop(
    dx_aoi %>%
      
      # buffer the aoi by 1000 units; the units for the current CRS are meters
      st_buffer(1000) %>%
      
      # transform the aoi
      st_transform(crs = st_crs(frap_fires))) %>%
  
  # reproject the clipped polygons to match that of the aoi
  st_transform(crs = st_crs(dx_aoi)) %>%
  
  # for nicer plotting, make year a numeric
  mutate(year = as.numeric(as.character(YEAR_)))
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
plot(frap_fires_clipped)
## Warning: plotting the first 9 out of 18 attributes; use max.plot = 18 to plot
## all

head(frap_fires_clipped)
## Simple feature collection with 6 features and 18 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 334848.3 ymin: 4044168 xmax: 343268.1 ymax: 4052491
## Projected CRS: NAD83 / UTM zone 11N
##       YEAR_ STATE AGENCY UNIT_ID FIRE_NAME  INC_NUM          ALARM_DATE
## 18473  2003    CA    NPS     KNP     GIANT 00000000 2003-07-28 17:00:00
## 18533  1926    CA    NPS     KNP    KAWEAH 00000000 1926-08-09 16:00:00
## 18537  1921    CA    NPS     KNP ELK CREEK 00000000 1921-07-04 16:00:00
## 18539  1922    CA    NPS     KNP  HOSPITAL 00000000 1922-08-12 16:00:00
## 18578  1947    CA    NPS     KNP   MORO CR 00000000 1947-08-04 16:00:00
## 18587  1926    CA    NPS     KNP  POTWISHA 00000000 1926-07-03 16:00:00
##                 CONT_DATE CAUSE COMMENTS REPORT_AC  GIS_ACRES C_METHOD
## 18473 2003-12-12 16:00:00     1                275   275.4718        1
## 18533 1926-08-18 16:00:00     7              86000 34357.7031        6
## 18537 1921-07-16 16:00:00     4               1600  1551.4668        8
## 18539 1922-08-16 16:00:00     4                550   667.1011        8
## 18578 1947-08-07 16:00:00     8                350   132.1309        8
## 18587 1926-07-09 16:00:00     3                500   881.0719        8
##       OBJECTIVE FIRE_NUM Shape_Length  Shape_Area
## 18473         2 00000048     5772.193   1114794.7
## 18533         1 00000001    80754.142 139040698.4
## 18537         1 00000001    13140.427   6278563.4
## 18539         1 00000003     8521.353   2699662.4
## 18578         1 00007017     4021.935    534714.9
## 18587         1 00000010     9902.657   3565571.5
##                                Shape year
## 18473 POLYGON ((342313.6 4048491,... 2003
## 18533 MULTIPOLYGON (((335790.3 40... 1926
## 18537 POLYGON ((336900.3 4045657,... 1921
## 18539 POLYGON ((341760.3 4045357,... 1922
## 18578 POLYGON ((342600.3 4046527,... 1947
## 18587 POLYGON ((340290.3 4044637,... 1926
tm_shape(frap_fires_clipped)+
  # add the fire polygons, using the year for fill color and transparency of 0.5
  tm_polygons(col = 'year', alpha = 0.5)+
  tm_shape(dx_aoi)+
  tm_borders('blue')

Raster operations

Raster algebra

If you have multiple rasters (or raster layers) and want to perform some per-pixel operation, it’s pretty easy, as long as the rasters line up (same extent, resolution, and CRS):

# get the average vpd value from january and february 2020
vpd_avg = 
  (terraclimate[[1]]+terraclimate[[2]])/2

plot(vpd_avg)

# TRUE if vpd in a pixel is above 2, FALSE otherwise
vpd_above_2 = terraclimate[[1]] > 2

plot(vpd_above_2)

Masking

We can ‘mask’ one layer by another.

# make a version of the vpd_avg raster, but only showing cells where 
# VPD in january is above 2
masked_avg = 
  raster::mask(x = vpd_avg,
               mask = vpd_above_2,
               maskvalue = FALSE)

plot(masked_avg)

Aggregating

We can downsample a numeric raster to make it coarser resolution and easier to work with:

aggregated_vpd = 
  raster::aggregate(x = terraclimate[[1]],
                    fact = 30)

plot(aggregated_vpd)

Terrain

DEMs are a common type of raster, and there’s a built-in function, terrain(), for calculating all sorts of useful products from a DEM:

if (!file.exists(here::here('data', 'small','seki_dem.tif'))){
      
  # download the DEM
  seki_dem = 
        elevatr::get_elev_raster(locations = 
                                   as(dx_aoi %>% st_buffer(100), 'Spatial'),
                           prj = st_crs(dx_aoi)$proj4string,
                           z = 13)
      
  # crop the DEM
  seki_dem = raster::crop(seki_dem, dx_aoi %>% st_buffer(100))
      
  # save the DEM
  raster::writeRaster(seki_dem, 
                          filename = 
                            here::here('data',
                                       'small',
                                       'seki_dem.tif'))
      
  } else {
  
    # just load the DEM if it's already downloaded   
    seki_dem = 
      raster::raster(here::here('data', 'small','seki_dem.tif')) 
    
  }

plot(seki_dem)

seki_terrain = 
  raster::terrain(seki_dem,
                  opt = c('slope', 'aspect', 'TPI', 'TRI', 'roughness','flowdir'))


plot(seki_terrain)

Raster algebra is useful for computing a southwestness index:

seki_aspect = 
  seki_dem %>%
  raster::terrain(opt = 'aspect',
                  unit = 'degrees')

# make a new raster where the value of every pixel is abs(current_value-225)
seki_southwestness = 
  abs(seki_aspect-225)

# for every pixel where the value is greater than 180, get a new value
# (which is 360 - current value of cells where the value is > 180)
seki_southwestness[seki_southwestness>180] = 
  360-seki_southwestness[seki_southwestness>180]

# for every pixel, divide the value by 180
seki_southwestness = seki_southwestness/180

plot(seki_aspect)

plot(seki_southwestness)

Moving windows

It’s also easy to do moving-window type analyses with the raster package:

seki_dem_avg = 
  raster::focal(x = seki_dem,
                
                # w defines the size of the window in pixels; see 
                # help(focal)
                w = matrix(nrow = 91, ncol = 91, data = rep(1, times = 8281)),
                
                # give each cell the mean value of all its neighbors; you can 
                # supply other functions here, including custom functions
                fun = mean,
                na.rm = TRUE)

plot(seki_dem)

plot(seki_dem_avg)

Vector operations

tidyverse operations

One of the nicest things about the sf package is how well it plays with tidyverse-style selection, mutation, summarization, etc.:

names(frap_fires)
##  [1] "YEAR_"        "STATE"        "AGENCY"       "UNIT_ID"      "FIRE_NAME"   
##  [6] "INC_NUM"      "ALARM_DATE"   "CONT_DATE"    "CAUSE"        "COMMENTS"    
## [11] "REPORT_AC"    "GIS_ACRES"    "C_METHOD"     "OBJECTIVE"    "FIRE_NUM"    
## [16] "Shape_Length" "Shape_Area"   "Shape"
# start with frap fires
big_frap_fires = 
  
  frap_fires %>%
  
  # keep only fires greater than 100000 acres
  filter(GIS_ACRES >= 100000)
  
plot(big_frap_fires['YEAR_'])

# get the total area in hectares of the big fires in each year
big_frap_fires_aggregated = 
  big_frap_fires %>%
  mutate(area_ha = GIS_ACRES * 0.404) %>%
  group_by(YEAR_) %>%
  summarise(area_ha = sum(area_ha)) %>%
  ungroup() 

Fixing broken polygons

A lot of the time, weird errors about broken geometry can be fixed with one of two common tricks:

big_frap_fires_aggregated = 
  
  # start with potentially broken geometry
  big_frap_fires_aggregated %>%
  
  # buffer by 0
  st_buffer(0) %>%
  
  # st_make_valid()
  st_make_valid()

Those two together take care of most busted geometries.
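You can check whether the repair worked (or diagnose what was wrong in the first place) with st_is_valid():

```r
# one logical per feature; any FALSE (or NA) means broken geometry
sf::st_is_valid(big_frap_fires_aggregated)

# reason = TRUE instead returns a character description of each problem
# (e.g. a self-intersection), or "Valid Geometry" for clean features
sf::st_is_valid(big_frap_fires_aggregated, reason = TRUE)
```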

Converting between polygons and rasters

From polygon to raster

Suppose we want a raster of the fire perimeters in our SEKI AOI:

frap_fires_clipped_raster = 
  raster::rasterize(x = frap_fires_clipped,
            field = 'year',
            y = seki_dem)

plot(frap_fires_clipped_raster)

See also the fasterize package for a faster, more efficient version of this function; rasterize() can be very slow for large or complex polygon datasets.
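A sketch of the fasterize equivalent (untested here; fasterize works on polygon sf objects with a numeric field and takes an existing RasterLayer as its template):

```r
library(fasterize)

# same idea as rasterize() above, but typically much faster
frap_fires_clipped_raster_fast = 
  fasterize::fasterize(sf = frap_fires_clipped,
                       raster = seki_dem,
                       field = 'year')
```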

From raster to polygon

We can also convert rasters to polygons:

plot(seki_dem)

seki_above_1600 = 
  
  # frequently, we want to aggregate a raster before converting it
  raster::aggregate(seki_dem, fact = 10) >= 1600

plot(seki_above_1600)

seki_above_1600_sf = 
  
  # convert to a SpatialPolygonsDataFrame
  rasterToPolygons(seki_above_1600, dissolve = TRUE) %>%
  
  # convert to an sf object
  sf::st_as_sf()
## Loading required namespace: rgeos
seki_above_1600_sf
## Simple feature collection with 2 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 335753.4 ymin: 4045219 xmax: 342728.1 ymax: 4051427
## Projected CRS: NAD83 / UTM zone 11N
##   layer                       geometry
## 1     0 MULTIPOLYGON (((340735.3 40...
## 2     1 MULTIPOLYGON (((342038.3 40...
tm_shape(seki_above_1600_sf)+
  tm_polygons('layer')

Conversion to / from data frames

From raster to data frame

seki_dem_df = 
  seki_dem %>%
  as.data.frame(xy = TRUE)

head(seki_dem_df)
##          x       y seki_dem
## 1 335757.2 4051423 1548.856
## 2 335764.9 4051423 1540.908
## 3 335772.6 4051423 1532.685
## 4 335780.2 4051423 1524.572
## 5 335787.9 4051423 1517.304
## 6 335795.6 4051423 1510.806

From sf to data frame

frap_fires_df = 
  frap_fires %>%
  as.data.frame() %>%
  select(-Shape) # usually is select(-geometry)

head(frap_fires_df)
##   YEAR_ STATE AGENCY UNIT_ID FIRE_NAME  INC_NUM          ALARM_DATE
## 1  2020    CA    CDF     NEU    NELSON 00013212 2020-06-17 17:00:00
## 2  2020    CA    CDF     NEU   AMORUSO 00011799 2020-05-31 17:00:00
## 3  2020    CA    CDF     NEU    ATHENS 00018493 2020-08-09 17:00:00
## 4  2020    CA    CDF     NEU   FLEMING 00007619 2020-03-30 17:00:00
## 5  2020    CA    CDF     NEU  MELANESE 00008471 2020-04-13 17:00:00
## 6  2020    CA    CDF     NEU       PFE 00014858 2020-07-04 17:00:00
##             CONT_DATE CAUSE COMMENTS REPORT_AC GIS_ACRES C_METHOD OBJECTIVE
## 1 2020-06-22 17:00:00    11              110.0 109.60250        1         1
## 2 2020-06-03 17:00:00     2              670.0 685.58502        1         1
## 3 2020-02-29 16:00:00    14               26.0  27.30048        1         1
## 4 2020-03-31 17:00:00     9               13.0  12.93155        1         1
## 5 2020-04-18 17:00:00    18               10.3  10.31596        1         1
## 6 2020-07-04 17:00:00    14               36.0  36.70193        1         1
##   FIRE_NUM Shape_Length Shape_Area
## 1     <NA>     3252.523  443544.67
## 2     <NA>     9653.760 2774464.03
## 3     <NA>     1649.643  110481.13
## 4     <NA>     1577.156   52332.11
## 5     <NA>     1035.788   41747.22
## 6     <NA>     2348.114  148527.45

Spatial data extraction

Extracting raster data at points

# get the elevation at each DX plot
dx_plots$elev_m = 
  raster::extract(seki_dem,
                dx_plots)

tm_shape(seki_dem)+
  tm_raster(palette = 'viridis', style = 'cont')+
  tm_shape(dx_plots)+
  tm_dots(col = 'elev_m')+
  tm_scale_bar()

Extracting raster data at polygons

# include df = TRUE so that we get a dataframe with polygon IDs
burned_elev = 
  raster::extract(seki_dem,
                frap_fires_clipped, 
                df = TRUE)

# the ID column gives the polygon ID, the row number in the polygons table; 
# there is one row per raster cell
head(burned_elev)
##   ID seki_dem
## 1  1 2012.342
## 2  1 2015.766
## 3  1 2020.600
## 4  1 2024.312
## 5  1 2026.531
## 6  1 2028.331
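Because the result is an ordinary data frame, summarizing per polygon is just a tidyverse group_by()/summarise() away; for example, the mean burned elevation of each fire:

```r
# one row per fire polygon, averaging over all of its raster cells
burned_elev %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(mean_elev_m = mean(seki_dem, na.rm = TRUE))
```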

st_intersection()

Use st_intersection() (or less frequently st_difference() or st_sym_difference()) to get the intersection, difference, or symmetrical difference between sf objects. For example, if we want to flag the DX plots as above or below 1600m elevation:

# start with points and polygons
tm_shape(seki_above_1600_sf)+
  tm_polygons('layer')+
  tm_shape(dx_plots)+
  tm_dots()
# convert the 0-1 layer to a more clear TRUE/FALSE flag
seki_above_1600_sf = 
  seki_above_1600_sf %>%
  mutate(above_1600 = layer==1) %>%
  select(-layer)

# use st_intersection() to extract the polygon values to the plot points
dx_plots = 
  dx_plots %>%
  st_intersection(seki_above_1600_sf)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
# now dx_plots has a column for above_1600, which was extracted from the polygons
head(dx_plots)
## Simple feature collection with 6 features and 20 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 336795.5 ymin: 4046051 xmax: 340315.2 ymax: 4049838
## Projected CRS: NAD83 / UTM zone 11N
##    Site Plot       Date       Crew Latitude_D Longitude_ Datum_Form   UTM_N
## 6  SEKI   13 2017-06-14 EK, DW, GV   36.57542  -118.7890      WGS84 4049264
## 19 SEKI   16 2017-06-22 LC, PM, DW   36.57546  -118.8241      WGS84 4049327
## 1  SEKI   10 2017-06-23 EK, KS, GV   36.54653  -118.7841      WGS84 4046051
## 2  SEKI    8 2017-06-22 EK, KS, GV   36.56500  -118.8017      WGS84 4048129
## 3  SEKI   26 2017-06-22 EK, KS, GV   36.54968  -118.8154      WGS84 4046453
## 4  SEKI   27 2017-07-02 EK, KS, GV   36.58047  -118.7974      WGS84 4049838
##       UTM_E Elevation_ Slope Aspect RootRot InPlot NearPlot Distance RootRotNot
## 6  339933.5   1595.018    25    106       N     NA       NA       NA         NA
## 19 336795.5   1592.000    51     72       N     NA       NA       NA         NA
## 1  340315.2   1588.008   102    290       N     NA       NA       NA         NA
## 2  338772.8   1595.628    58     10       N     NA       NA       NA         NA
## 3  337516.4   1603.553    74    290       N     NA       NA       NA         NA
## 4  339196.0   1803.502    33     80       N     NA       NA       NA         NA
##    OtherNotes   elev_m above_1600                 geometry
## 6          NA 1582.724      FALSE POINT (339933.5 4049264)
## 19         NA 1571.039      FALSE POINT (336795.5 4049327)
## 1          NA 1617.868       TRUE POINT (340315.2 4046051)
## 2          NA 1598.534       TRUE POINT (338772.8 4048129)
## 3          NA 1601.352       TRUE POINT (337516.4 4046453)
## 4          NA 1818.514       TRUE   POINT (339196 4049838)
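st_difference() works the same way but keeps the parts of the first object that fall *outside* the second. As a sketch (reusing the dx_plots and seki_above_1600_sf objects from above; the dx_plots_low name is just illustrative), we could keep only the plots below 1600m:

```r
library(sf)
library(dplyr)

# dissolve the above-1600m polygons into one geometry with st_union(), then 
# drop any plot points that fall inside it
dx_plots_low = 
  dx_plots %>%
  st_difference(seki_above_1600_sf %>%
                  filter(above_1600) %>%
                  st_union())
```

Dissolving with st_union() first avoids the duplicated rows you'd otherwise get when the second object has multiple overlapping features.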

Random or gridded sample points

Dropping gridded sample points over some area of interest is easy:

gridded_plots = 
  dx_aoi %>%
  
  # see the function help for options for a hexagonal grid, a grid based on 
  # corners rather than centers, or to return grid polygons instead of points
  st_make_grid(
    # alternatively, you can use the cellsize option to set the grid spacing
    n = c(10, 10),
    what = 'centers',
    square = TRUE
  )

# by default this gives us a grid over the whole bbox, so use st_intersection 
# to only keep points in the aoi
tm_shape(dx_aoi)+
  tm_borders(col = 'black')+
  tm_shape(gridded_plots)+
  tm_dots('red')
gridded_plots = 
  gridded_plots %>%
  
  # take an inner buffer of 100m to avoid putting plots right on the boundary
  st_intersection(dx_aoi %>%
                    st_buffer(-100))

tm_shape(dx_aoi)+
  tm_borders(col = 'black')+
  tm_shape(gridded_plots)+
  tm_dots('red')

We can also place plots randomly, and even stratify them by polygon:

tm_shape(frap_fires_clipped)+
  tm_borders('red')
fire_plots = 
  frap_fires_clipped %>%
  st_sample(
    
    # or just 'size = 10' to sample 10 points distributed across all the polygons
    size = rep(10, times = nrow(frap_fires_clipped)),
    
    # can also be 'regular' or 'hexagonal'
    type = 'random',
    
    # if FALSE, you may get approximately (not exactly) the requested number of points
    exact = TRUE
  ) %>%
  st_as_sf()

head(fire_plots)
## Simple feature collection with 6 features and 0 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 342421.9 ymin: 4047263 xmax: 343138.4 ymax: 4048391
## Projected CRS: NAD83 / UTM zone 11N
##                          x
## 1 POINT (342653.6 4047992)
## 2 POINT (342421.9 4048391)
## 3 POINT (342794.8 4047263)
## 4 POINT (343138.4 4047582)
## 5 POINT (343018.3 4047932)
## 6 POINT (342484.3 4047330)
# this is a good example of how helpful the zoomable interactive map is for 
# checking out spatial data
tm_shape(frap_fires_clipped)+
  tm_borders('red')+
  tm_shape(fire_plots)+
  tm_dots('blue')

Making maps and figures

ggplot2 basics

For static maps like you’d use in a presentation or paper, I prefer ggplot over tmap, because I find it more flexible for getting the details exactly the way I want them.
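The seki_dem_df object used below is the DEM raster flattened into a dataframe. A sketch of that conversion (assuming seki_dem is the RasterLayer loaded earlier):

```r
library(raster)

# convert the raster to a dataframe of cell centers; xy = TRUE adds the x and 
# y coordinate columns that geom_raster() needs, and na.rm = TRUE drops cells 
# with no data
seki_dem_df = as.data.frame(seki_dem, xy = TRUE, na.rm = TRUE)
```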

ggplot()+
  
  # add raster data; note that you have to extract it to a dataframe with x 
  # and y coordinates first; make sure the CRS matches that of the vector data 
  # you want to use so that things line up
  geom_raster(data = seki_dem_df,
              aes(x = x, y = y, fill = seki_dem))+
  scale_fill_viridis_c(option = 'C')+
  
  # add vector data
  geom_sf(data = dx_aoi,
          fill = NA,
          color = 'black',
          lwd = 2)+
  
  coord_sf()+
  theme_minimal()+
  
  # nice scale bar using ggspatial package
  ggspatial::annotation_scale()

Basemaps

Basic interactive basemaps are pretty easy to do with tmap, but getting nice basemaps working for a static plot like you’d use in a paper or presentation is harder. Here’s an example using the basemaps package:

basemaps::basemap_ggplot(ext = st_buffer(dx_aoi, 100),
                  map_service = 'esri',
                  map_type = 'world_imagery')+
  geom_sf(data = st_transform(dx_aoi, 'EPSG:3857'),
          fill = NA,
          lwd = 2,
          color = 'yellow')+
  theme_bw()+
  ggspatial::annotation_scale()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank())
## Loading basemap 'world_imagery' from map service 'esri'...
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Fancy inset maps

See the references section.
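As a starting point before digging into the references, the usual pattern is to build the main map and the small overview map as separate ggplot objects and then overlay them. A minimal sketch using the cowplot package (main_map and inset_map are hypothetical ggplot objects):

```r
library(cowplot)

# place the inset on top of the main map; x, y, width, and height are in 
# relative (0-1) coordinates of the full plot area
ggdraw(main_map)+
  draw_plot(inset_map, x = 0.65, y = 0.65, width = 0.3, height = 0.3)
```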

Good sources of spatial data